The lack of efficient segmentation methods and fully-labeled datasets limits the comprehensive assessment of optical coherence tomography angiography (OCTA) microstructures like retinal vessel network (RVN) and foveal avascular zone (FAZ), which are of great value in ophthalmic and systematic diseases evaluation. Here, we introduce an innovative OCTA microstructure segmentation network (OMSN) by combining an encoder-decoder-based architecture with multi-scale skip connections and the split-attention-based residual network ResNeSt, paying specific attention to OCTA microstructural features while facilitating better model convergence and feature representations. The proposed OMSN achieves excellent single/multi-task performances for RVN or/and FAZ segmentation. Especially, the evaluation metrics on multi-task models outperform single-task models on the same dataset. On this basis, a fully annotated retinal OCTA segmentation (FAROS) dataset is constructed semi-automatically, filling the vacancy of a pixel-level fully-labeled OCTA dataset. OMSN multi-task segmentation model retrained with FAROS further certifies its outstanding accuracy for simultaneous RVN and FAZ segmentation.
translated by 谷歌翻译
Graph Neural Networks (GNNs), originally proposed for node classification, have also motivated many recent works on edge prediction (a.k.a., link prediction). However, existing methods lack elaborate design regarding the distinctions between two tasks that have been frequently overlooked: (i) edges only constitute the topology in the node classification task but can be used as both the topology and the supervisions (i.e., labels) in the edge prediction task; (ii) the node classification makes prediction over each individual node, while the edge prediction is determinated by each pair of nodes. To this end, we propose a novel edge prediction paradigm named Edge-aware Message PassIng neuRal nEtworks (EMPIRE). Concretely, we first introduce an edge splitting technique to specify use of each edge where each edge is solely used as either the topology or the supervision (named as topology edge or supervision edge). We then develop a new message passing mechanism that generates the messages to source nodes (through topology edges) being aware of target nodes (through supervision edges). In order to emphasize the differences between pairs connected by supervision edges and pairs unconnected, we further weight the messages to highlight the relative ones that can reflect the differences. In addition, we design a novel negative node-pair sampling trick that efficiently samples 'hard' negative instances in the supervision instances, and can significantly improve the performance. Experimental results verify that the proposed method can significantly outperform existing state-of-the-art models regarding the edge prediction task on multiple homogeneous and heterogeneous graph datasets.
translated by 谷歌翻译
Knowledge tracing (KT) aims to leverage students' learning histories to estimate their mastery levels on a set of pre-defined skills, based on which the corresponding future performance can be accurately predicted. In practice, a student's learning history comprises answers to sets of massed questions, each known as a session, rather than merely being a sequence of independent answers. Theoretically, within and across these sessions, students' learning dynamics can be very different. Therefore, how to effectively model the dynamics of students' knowledge states within and across the sessions is crucial for handling the KT problem. Most existing KT models treat student's learning records as a single continuing sequence, without capturing the sessional shift of students' knowledge state. To address the above issue, we propose a novel hierarchical transformer model, named HiTSKT, comprises an interaction(-level) encoder to capture the knowledge a student acquires within a session, and a session(-level) encoder to summarise acquired knowledge across the past sessions. To predict an interaction in the current session, a knowledge retriever integrates the summarised past-session knowledge with the previous interactions' information into proper knowledge representations. These representations are then used to compute the student's current knowledge state. Additionally, to model the student's long-term forgetting behaviour across the sessions, a power-law-decay attention mechanism is designed and deployed in the session encoder, allowing it to emphasize more on the recent sessions. Extensive experiments on three public datasets demonstrate that HiTSKT achieves new state-of-the-art performance on all the datasets compared with six state-of-the-art KT models.
translated by 谷歌翻译
Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing the documentation code pairs by embedding them into latent space, without the association of external knowledge. In this paper, we propose a generation-augmented query expansion framework. Inspired by the human retrieval process - sketching an answer before searching, in this work, we utilize the powerful code generation model to benefit the code retrieval task. Specifically, we demonstrate that rather than merely retrieving the target code snippet according to the documentation query, it would be helpful to augment the documentation query with its generation counterpart - generated code snippets from the code generation model. To the best of our knowledge, this is the first attempt that leverages the code generation model to enhance the code retrieval task. We achieve new state-of-the-art results on the CodeSearchNet benchmark and surpass the baselines significantly.
translated by 谷歌翻译
Temporal reasoning is the task of predicting temporal relations of event pairs with corresponding contexts. While some temporal reasoning models perform reasonably well on in-domain benchmarks, we have little idea of the systems' generalizability due to existing datasets' limitations. In this work, we introduce a novel task named TODAY that bridges this gap with temporal differential analysis, which as the name suggests, evaluates if systems can correctly understand the effect of incremental changes. Specifically, TODAY makes slight context changes for given event pairs, and systems need to tell how this subtle contextual change will affect temporal relation distributions. To facilitate learning, TODAY also annotates human explanations. We show that existing models, including GPT-3, drop to random guessing on TODAY, suggesting that they heavily rely on spurious information rather than proper reasoning for temporal predictions. On the other hand, we show that TODAY's supervision style and explanation annotations can be used in joint learning and encourage models to use more appropriate signals during training and outperform across several benchmarks. TODAY can also be used to train models to solicit incidental supervision from noisy sources such as GPT-3 and moves farther towards generic temporal reasoning systems.
translated by 谷歌翻译
Mixed-precision quantization has received increasing attention for its capability of reducing the computational burden and speeding up the inference time. Existing methods usually focus on the sensitivity of different network layers, which requires a time-consuming search or training process. To this end, a novel mixed-precision quantization method, termed CSMPQ, is proposed. Specifically, the TF-IDF metric that is widely used in natural language processing (NLP) is introduced to measure the class separability of layer-wise feature maps. Furthermore, a linear programming problem is designed to derive the optimal bit configuration for each layer. Without any iterative process, the proposed CSMPQ achieves better compression trade-offs than the state-of-the-art quantization methods. Specifically, CSMPQ achieves 73.03$\%$ Top-1 acc on ResNet-18 with only 59G BOPs for QAT, and 71.30$\%$ top-1 acc with only 1.5Mb on MobileNetV2 for PTQ.
translated by 谷歌翻译
A coreference resolution system is to cluster all mentions that refer to the same entity in a given context. All coreference resolution systems need to tackle two main tasks: one task is to detect all of the potential mentions, and the other is to learn the linking of an antecedent for each possible mention. In this paper, we propose a hybrid rule-neural coreference resolution system based on actor-critic learning, such that it can achieve better coreference performance by leveraging the advantages from both the heuristic rules and a neural conference model. This end-to-end system can also perform both mention detection and resolution by leveraging a joint training algorithm. We experiment on the BERT model to generate input span representations. Our model with the BERT span representation achieves the state-of-the-art performance among the models on the CoNLL-2012 Shared Task English Test Set.
translated by 谷歌翻译
This paper studies offline policy learning, which aims at utilizing observations collected a priori (from either fixed or adaptively evolving behavior policies) to learn an optimal individualized decision rule that achieves the best overall outcomes for a given population. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics are lower bounded in the offline dataset; put differently, the performance of the existing methods depends on the worst-case propensity in the offline dataset. As one has no control over the data collection process, this assumption can be unrealistic in many situations, especially when the behavior policies are allowed to evolve over time with diminishing propensities for certain actions. In this paper, we propose a new algorithm that optimizes lower confidence bounds (LCBs) -- instead of point estimates -- of the policy values. The LCBs are constructed using knowledge of the behavior policies for collecting the offline data. Without assuming any uniform overlap condition, we establish a data-dependent upper bound for the suboptimality of our algorithm, which only depends on (i) the overlap for the optimal policy, and (ii) the complexity of the policy class we optimize over. As an implication, for adaptively collected data, we ensure efficient policy learning as long as the propensities for optimal actions are lower bounded over time, while those for suboptimal ones are allowed to diminish arbitrarily fast. In our theoretical analysis, we develop a new self-normalized type concentration inequality for inverse-propensity-weighting estimators, generalizing the well-known empirical Bernstein's inequality to unbounded and non-i.i.d. data.
translated by 谷歌翻译
The target of a coreference resolution system is to cluster all mentions that refer to the same entity in a given context. All coreference resolution systems need to solve two subtasks; one task is to detect all of the potential mentions, and the other is to learn the linking of an antecedent for each possible mention. In this paper, we propose a reinforcement learning actor-critic-based neural coreference resolution system, which can achieve both mention detection and mention clustering by leveraging an actor-critic deep reinforcement learning technique and a joint training algorithm. We experiment on the BERT model to generate different input span representations. Our model with the BERT span representation achieves the state-of-the-art performance among the models on the CoNLL-2012 Shared Task English Test Set.
translated by 谷歌翻译
Most recent semantic frame parsing systems for spoken language understanding (SLU) are designed based on recurrent neural networks. These systems display decent performance on benchmark SLU datasets such as ATIS or SNIPS, which contain short utterances with relatively simple patterns. However, the current semantic frame parsing models lack a mechanism to handle out-of-distribution (\emph{OOD}) patterns and out-of-vocabulary (\emph{OOV}) tokens. In this paper, we introduce a robust semantic frame parsing pipeline that can handle both \emph{OOD} patterns and \emph{OOV} tokens in conjunction with a new complex Twitter dataset that contains long tweets with more \emph{OOD} patterns and \emph{OOV} tokens. The new pipeline demonstrates much better results in comparison to state-of-the-art baseline SLU models on both the SNIPS dataset and the new Twitter dataset (Our new Twitter dataset can be downloaded from https://1drv.ms/u/s!AroHb-W6_OAlavK4begsDsMALfE?e=c8f2XX ). Finally, we also build an E2E application to demo the feasibility of our algorithm and show why it is useful in real application.
translated by 谷歌翻译